Decoupling the Data Geometry from the Parameter Geometry for Stochastic Gradients

Author

  • YANN LECUN
Abstract

Large-scale learning problems require algorithms that scale benignly with respect to the size of the dataset and the number of parameters to be trained, leading numerous practitioners to favor the classic stochastic gradient descent (SGD [1, 2, 3]) over more sophisticated methods. Besides its fast convergence, SGD has been observed to sometimes lead to significantly better generalization performance than batch gradient descent. SGD is also quicker than batch methods at adapting to non-stationary data distributions. Its Achilles heel is its inherently sequential updates, which make it very difficult to parallelize across many machines and therefore clash with the goals of large-scale learning. Our goals here are twofold. On the theoretical level, we want to gain a fuller understanding of how the dynamics of stochastic updates contrast with those of batch updates, and how they are affected by the conditioning of energy surfaces, the presence of local optima, and the properties of the data distribution. On the practical level, we want to use this knowledge to design more efficient mini-batch SGD variants (which are parallelizable), together with robust settings for their hyper-parameters. The study of stochastic gradient methods dates back over six decades [1, 2, 3, 4, 5, 6, 7], but to our knowledge, the present questions remain understudied. The most similar viewpoint is found in approaches based on natural gradients and information geometry [8, 9, 10].
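The contrast the abstract draws between stochastic, mini-batch, and batch updates can be illustrated with a minimal sketch. This is not the method proposed in the paper. The function name `minibatch_sgd`, the toy data, and the settings for `lr`, `epochs`, and `seed` are all assumptions chosen for illustration. A single `batch_size` parameter selects the regime: 1 gives plain SGD, the dataset size gives batch gradient descent, and anything in between gives mini-batch SGD, where the per-example gradients inside a batch could be computed in parallel.

```python
import random


def minibatch_sgd(xs, ys, batch_size, lr=0.5, epochs=200, seed=0):
    """Fit y ~ w*x + b by least squares with mini-batch gradient steps.

    batch_size=1 is plain SGD; batch_size=len(xs) is batch gradient descent.
    All names and default values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the examples in a fresh random order
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Residuals of the current model on this batch.
            residuals = [(w * xs[i] + b - ys[i], xs[i]) for i in batch]
            # Average gradient of 0.5 * (w*x + b - y)**2 over the batch.
            grad_w = sum(r * x for r, x in residuals) / len(batch)
            grad_b = sum(r for r, _ in residuals) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Hypothetical noiseless data drawn from y = 2x + 1 on [-1, 1].
xs = [i / 10 for i in range(-10, 11)]
ys = [2.0 * x + 1.0 for x in xs]

w_sgd, b_sgd = minibatch_sgd(xs, ys, batch_size=1)            # stochastic
w_mb, b_mb = minibatch_sgd(xs, ys, batch_size=4)              # mini-batch
w_batch, b_batch = minibatch_sgd(xs, ys, batch_size=len(xs))  # full batch
```

On this small, well-conditioned, noiseless problem all three regimes recover the same line. The differences the abstract is concerned with (conditioning, local optima, properties of the data distribution) only show up on harder problems.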


Similar resources

Automatic estimation of regularization parameter by active constraint balancing method for 3D inversion of gravity data

Gravity data inversion is one of the important steps in the interpretation of practical gravity data. The inversion result can be obtained by minimization of the Tikhonov objective function. The determination of an optimal regularization parameter is highly important in the gravity data inversion. In this work, an attempt was made to use the active constraint balancing (ACB) method to select the...


Investigating the Effect of Joint Geometry of the Gas Tungsten Arc Welding Process on the Residual Stress and Distortion using the Finite Element Method

Although a few models have been proposed for 3D simulation of different welding processes, 2D models are still more effective in design goals, thus more popular due to the short-time analysis. In this research, replacing "time" by the "third dimension of place", the gas tungsten arc welding process was simulated by the finite element method in two dimensions and in a short time with acceptable ...


Discrimination of Geological Top-Formations by their Morphology through SAR Images and via Fractal Geometry implementation in IEM Backscattering Model (Case Study: Zagros Thrust Belt)

Morphological discrimination of geological top-formations is a supplemental procedure in geological mapping, so in situ measurements to register geomorphological data are unavoidable. However, because of impassable terrain and fault cliffs, field operations to visit all areas within a geological map are almost impossible. Microwave or radar remote sensing, via synthetic aperture radar (SAR) images is cap...


Investigating the Effects of Inlet Conditions and Nozzle Geometry on the Performance of Supersonic Separator Used for Natural Gas Dehumidification

Supersonic separators have found extensive applications in dehumidification of natural gases since 2003. Unlike previous studies, which have investigated the inlet conditions and nozzle geometry of supersonic separators for pure fluids, the present study employed a combination of momentum, heat, and mass transfer equations along with Virial equation of state (EOS) to inspect the effect of inlet...


A Computational Study of the Effects of Combustion Chamber Geometries on Combustion Process and Emission in a DI Diesel Engine

A computational study aiming to investigate the effect of combustion chamber geometry on combustion process and emission has been carried out in a direct injection diesel engine. The combustion process and emission of three different combustion chamber geometries were considered, and combustion process behaviors such as variation of mean pressure, velocity, heat release rate, emission productio...




Publication date: 2012